After running our Pylearn2 models, it's probably not best to compare them on the score they get on the validation set, as that set is used during training and so could be a victim of overfitting. It would be better to run the model over the test set, which is supposed to be a holdout set used only to compare models. We could rerun all our models with a monitor on this value, but for models we've already run it would be more useful to be able to pull the value out of the existing pickle.

This may well be wasted effort, because it seems like the kind of thing that should already exist in Pylearn2. Unfortunately, I can't find it, and since it seems fairly simple to implement I'm just going to go ahead and write it.

Hopefully, this will also help us figure out what's going wrong with some submissions that turn out to be incredibly bad; for example, those using augmentation.


In [1]:
import pylearn2.utils
import pylearn2.config
import theano
import neukrill_net.dense_dataset
import neukrill_net.utils
import numpy as np
%matplotlib inline
import matplotlib.pyplot as plt
import holoviews as hl
%load_ext holoviews.ipython


Using gpu device 0: Tesla K40c
:0: FutureWarning: IPython widgets are experimental and may change in the future.
Welcome to the HoloViews IPython extension! (http://ioam.github.io/holoviews/)
Available magics: %compositor, %opts, %params, %view, %%labels, %%opts, %%view

Loading data and model

Initialise by loading the settings and the test dataset we're going to be using:


In [2]:
cd ..


/afs/inf.ed.ac.uk/user/s08/s0805516/repos/neukrill-net-work

In [232]:
settings = neukrill_net.utils.Settings("settings.json")
run_settings = neukrill_net.utils.load_run_settings(
    "run_settings/alexnet_based_norm_global_8aug.json", settings, force=True)

In [233]:
%%time
# loading the model
model = pylearn2.utils.serial.load(run_settings['pickle abspath'])


CPU times: user 5.52 s, sys: 98 ms, total: 5.62 s
Wall time: 5.67 s

In [234]:
reload(neukrill_net.dense_dataset)


Out[234]:
<module 'neukrill_net.dense_dataset' from '/afs/inf.ed.ac.uk/user/s08/s0805516/repos/neukrill-net-tools/neukrill_net/dense_dataset.pyc'>

In [235]:
%%time
# loading the data
dataset = neukrill_net.dense_dataset.DensePNGDataset(
    settings_path=run_settings['settings_path'],
    run_settings=run_settings['run_settings_path'],
    train_or_predict='train',
    training_set_mode='test', force=True)


CPU times: user 25.7 s, sys: 592 ms, total: 26.3 s
Wall time: 28.6 s

Setting up forward pass

Now that we've loaded the data and the model, we're going to set up a forward pass through the data in the same way we do it in the test.py script: pick a batch size, compile a Theano function, and then iterate over the whole dataset in batches, filling an array of predictions.


In [236]:
# find allowed batch size over 1000 (want big batches)
# (Theano has to have fixed batch size and we don't want leftover)
batch_size=1000
while dataset.X.shape[0]%batch_size != 0:
    batch_size += 1

In [237]:
n_batches = int(dataset.X.shape[0]/batch_size)

In [238]:
%%time
# set this batch size
model.set_batch_size(batch_size)
# compile Theano function
X = model.get_input_space().make_batch_theano()
Y = model.fprop(X)
f = theano.function([X],Y)


CPU times: user 1.44 s, sys: 49 ms, total: 1.49 s
Wall time: 1.94 s

Compute probabilities

The following is the same as the code in test.py that runs the forward pass over the dataset in batches to produce the predicted probabilities.


In [239]:
%%time
y = np.zeros((dataset.X.shape[0],len(settings.classes)))
for i in xrange(n_batches):
    print("Batch {0} of {1}".format(i+1,n_batches))
    x_arg = dataset.X[i*batch_size:(i+1)*batch_size,:]
    if X.ndim > 2:
        x_arg = dataset.get_topological_view(x_arg)
    y[i*batch_size:(i+1)*batch_size,:] = (f(x_arg.astype(X.dtype).T))


Batch 1 of 8
Batch 2 of 8
Batch 3 of 8
Batch 4 of 8
Batch 5 of 8
Batch 6 of 8
Batch 7 of 8
Batch 8 of 8
CPU times: user 5.43 s, sys: 5.94 s, total: 11.4 s
Wall time: 11.4 s

In [240]:
plt.scatter(np.where(y == 0)[1],np.where(y==0)[0])


Out[240]:
<matplotlib.collections.PathCollection at 0x7f8143001390>

Of course, it's strange that there are any exact zeros at all, since a zero probability on the true class would be heavily penalised by the log loss. Hopefully they'll go away when we start averaging.
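
As a quick check (not part of the original cells above), we could count how many of the predicted probabilities are exactly zero; this is just a sketch assuming y is the prediction array filled above:

# count predictions that are exactly zero; exact zeros are suspicious because
# a zero probability on the true class is heavily penalised by the log loss
print("{0} of {1} predicted probabilities are exactly zero".format(
    int(np.sum(y == 0)), y.size))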

Score before averaging

We can score the model before averaging by just using the class labels as they were going to be used for training, with scikit-learn's utility for calculating the log loss:


In [241]:
import sklearn.metrics

In [242]:
sklearn.metrics.log_loss(dataset.y,y)


Out[242]:
0.9356949090306742

Score after averaging

In test.py we take the least intelligent approach to dealing with averaging over the different augmented versions: we just assume that, whatever the augmentation factor is, the labels repeat in blocks of that size, so we can average each block of predictions into a single row of probabilities.

First, we should check that assumption:


In [245]:
# augmentation factor
af = 8

In [246]:
# check the labels really do repeat in blocks of af
for low, high in zip(range(0, dataset.y.shape[0], af),
                     range(af, dataset.y.shape[0]+af, af)):
    first = dataset.y[low][0]
    if any(first != i for i in dataset.y[low:high].ravel()):
        print("Labels do not match at:", (low, high))
        break

In [247]:
# average predictions over each block of af augmented copies
y_collapsed = np.zeros((int(dataset.X.shape[0]/af), len(settings.classes)))
for i, (low, high) in enumerate(zip(range(0, dataset.y.shape[0], af),
                                    range(af, dataset.y.shape[0]+af, af))):
    y_collapsed[i, :] = np.mean(y[low:high, :], axis=0)
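
For reference (not how test.py does it), the same collapse can be written without the explicit loop using a reshape; this sketch assumes the number of rows in y is an exact multiple of af, which is what we checked above:

# reshape to (n_images, af, n_classes) and average over the af augmented copies
y_collapsed_alt = y.reshape(-1, af, y.shape[1]).mean(axis=1)
assert np.allclose(y_collapsed_alt, y_collapsed)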

In [248]:
plt.scatter(np.where(y_collapsed == 0)[1],np.where(y_collapsed == 0)[0])


Out[248]:
<matplotlib.collections.PathCollection at 0x7f814259d0d0>

There are no zeros in there now!


In [249]:
labels_collapsed = dataset.y[range(0,dataset.y.shape[0],af)]

In [250]:
labels_collapsed.shape


Out[250]:
(3089, 1)

In [251]:
sklearn.metrics.log_loss(labels_collapsed,y_collapsed)


Out[251]:
0.80749745153937036

That's pretty much exactly what we got on the leaderboard.
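
Putting the steps above together, a reusable scoring function might look something like the following sketch. It just repeats the cells above; the name score_model_on_test and its arguments are hypothetical, not part of neukrill_net or test.py:

import numpy as np
import theano
import sklearn.metrics
import pylearn2.utils
import neukrill_net.dense_dataset
import neukrill_net.utils

def score_model_on_test(settings_path, run_settings_path, augmentation_factor):
    """Load a pickled Pylearn2 model and return its test-set log loss.

    A sketch that repeats the cells above; the function name and
    arguments are hypothetical, not part of neukrill_net or test.py."""
    # load settings, run settings, the pickled model and the test split
    settings = neukrill_net.utils.Settings(settings_path)
    run_settings = neukrill_net.utils.load_run_settings(
        run_settings_path, settings, force=True)
    model = pylearn2.utils.serial.load(run_settings['pickle abspath'])
    dataset = neukrill_net.dense_dataset.DensePNGDataset(
        settings_path=run_settings['settings_path'],
        run_settings=run_settings['run_settings_path'],
        train_or_predict='train',
        training_set_mode='test', force=True)

    # pick a batch size of at least 1000 that divides the dataset exactly
    batch_size = 1000
    while dataset.X.shape[0] % batch_size != 0:
        batch_size += 1
    n_batches = int(dataset.X.shape[0]/batch_size)

    # compile the forward pass
    model.set_batch_size(batch_size)
    X = model.get_input_space().make_batch_theano()
    Y = model.fprop(X)
    f = theano.function([X], Y)

    # fill the prediction array batch by batch
    y = np.zeros((dataset.X.shape[0], len(settings.classes)))
    for i in xrange(n_batches):
        x_arg = dataset.X[i*batch_size:(i+1)*batch_size, :]
        if X.ndim > 2:
            x_arg = dataset.get_topological_view(x_arg)
        y[i*batch_size:(i+1)*batch_size, :] = f(x_arg.astype(X.dtype).T)

    # average over each block of augmented copies and score
    af = augmentation_factor
    y_collapsed = y.reshape(-1, af, y.shape[1]).mean(axis=1)
    labels_collapsed = dataset.y[range(0, dataset.y.shape[0], af)]
    return sklearn.metrics.log_loss(labels_collapsed, y_collapsed)

Called with the run settings used above and augmentation_factor=8, this should reproduce the collapsed log loss we just computed.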